Core Architectures & Access Models: The Transformer's Reign and the Openness Divide
The Transformer architecture, particularly its decoder-only variants, continues to be the dominant backbone for both open-source (e.g., Llama 3, Qwen2, Mistral) and closed-source (e.g., GPT-4o, Claude 3 Opus, Gemini 1.5 Pro) LLMs. The primary divergence lies in access and transparency. Open models, such as Llama 3 (Meta Llama 3 Community License), Mistral 7B (Apache 2.0), or Qwen2 (Tongyi Qianwen LICENSE), generally provide access to model weights, and often to inference and fine-tuning code, fostering community-driven innovation and scrutiny. Closed models maintain proprietary control over weights and architectural details, offering access primarily through APIs. This distinction profoundly shapes their roles as "Foundation Models": open versions provide an inspectable and adaptable base, while closed ones offer powerful but more opaque platforms.
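The access divide is visible in code. Below is a minimal sketch of the two paths, assuming the `transformers` and `openai` Python packages; the model IDs and prompt are illustrative, and the closed-model call requires an `OPENAI_API_KEY` in the environment.

```python
# Open-weights access: download the weights and run inference locally.
# (Illustrative model ID; assumes `transformers` plus `accelerate` for device_map.)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.3"  # Apache 2.0 open weights
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Explain decoder-only Transformers in one sentence.",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Closed-model access: weights stay proprietary; only an API surface is exposed.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "Explain decoder-only Transformers in one sentence."}],
)
print(response.choices[0].message.content)
```

The first path allows inspecting, quantizing, or fine-tuning the weights; the second offers none of that, but no hosting burden either.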
Taxonomy: Incumbents, Challengers, and the Spectrum of Openness
The LLM landscape is diverse, with models categorized by their development approach and access policies:
- Closed-Source Frontier Models: Examples: OpenAI GPT-4o (parameter count undisclosed, MMLU ~88.7%), Anthropic Claude 3 Opus (MMLU ~86.8%), Google Gemini 1.5 Pro (MMLU ~85.9%). Strengths: Often lead in general benchmarks and multimodal capabilities, backed by extensive R&D and proprietary data. Limitations: Opacity, API costs, potential vendor lock-in, less direct customizability.
- Open-Source Flagship Models: Examples: Meta Llama 3.1 70B/405B (MMLU ~86.0% for 70B Instruct), Alibaba Qwen2 72B (MMLU ~79.5% Instruct), DeepSeek-LLM 67B (MMLU ~75.7% Base), Mistral Large (an edge case: API-first, weights not openly released, MMLU ~81.2%). Strengths: Rapidly improving performance, transparency (weights usually available), high customizability, strong community support. Limitations: Can trail the absolute SOTA on some frontier tasks; the largest variants are resource-intensive to self-host and fine-tune.
- Open-Source Efficient & Specialized Models: Examples: Google Gemma 2 9B/27B, Microsoft Phi-3 series, Mistral 7B/8x7B (Mixtral), Qwen1.5 0.5B-14B. Strengths: Excellent performance-per-parameter, suitability for on-device/local deployment, lower inference costs, strong for specific tasks or as fine-tuning bases. Limitations: Lower raw capabilities than flagship models. Many are available via Hugging Face and Ollama; see the deployment sketch after this list.
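To make the local-deployment point concrete, here is a minimal sketch of running an efficient model against an Ollama server. It assumes `ollama serve` is running and the model has been fetched with `ollama pull phi3`; the port and endpoint are Ollama's defaults, and the prompt is illustrative.

```python
# Query a locally hosted efficient model through Ollama's REST API.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",   # Ollama's default local endpoint
    json={
        "model": "phi3",                      # Microsoft Phi-3-mini, ~3.8B params
        "prompt": "Summarize the trade-offs of small on-device LLMs.",
        "stream": False,                      # one JSON object instead of a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```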
The Expanding Realm of Agentic AI: Customization vs. Integrated Platforms
AI agents capable of planning, tool use, and executing multi-step tasks are increasingly built upon LLMs. Open models like Llama 3, Qwen, or Mistral give developers high flexibility to create bespoke agentic frameworks, enabling deep integration with custom tools and data. Closed platforms (e.g., OpenAI's Assistants API with function calling, Anthropic's tool use capabilities, Google's Vertex AI Agent Builder) provide more integrated and often more polished agentic environments, but may restrict the degree of control and customization available. The choice influences innovation velocity, accessibility, and the kinds of agentic systems that can be built.
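As a concrete illustration of the tool-use pattern these platforms share, the sketch below runs one function-calling round trip via the OpenAI chat completions API: the model decides to call a tool, the host executes it, and the result is fed back for a final answer. `get_weather` is a hypothetical stand-in for a custom tool; the model ID and prompt are illustrative.

```python
# A minimal single-tool agent loop (sketch), assuming the `openai` Python SDK.
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    """Hypothetical local tool; a real agent would call a weather API here."""
    return json.dumps({"city": city, "forecast": "sunny", "temp_c": 21})

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
reply = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
msg = reply.choices[0].message

if msg.tool_calls:  # the model chose to call our tool
    messages.append(msg)
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = get_weather(**args)  # host-side execution of the tool
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    # Second round: the model turns the tool result into a final answer.
    final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    print(final.choices[0].message.content)
else:
    print(msg.content)
```

With an open model, the same loop can be reimplemented end to end (prompting, tool schemas, execution policy all under the developer's control), which is precisely the flexibility-versus-integration trade-off described above.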